IUT DE CARCASSONNE DE L’UNIVERSITÉ DE PERPIGNAN

Semester 4, VIGAN JÉROS

""" Created on Mon May 4 12:29:20 2020 in Spyder (IPython)

@author: Jéros """

=================================================================

THEME: How resilient are countries in the face of this pandemic?

=================================================================

Research questions

Assess rurality data such as the rural population share and the arable land share

Assess economic data such as gross national income (GNI, named RNB in the code) and the 2018 world population

Assess the spread of the pandemic across the countries of the world

Assess the links between rurality data, economic data and the pandemic

=================================================================

Working modules

=================================================================

Workspace management, data import and data manipulation

In [4]:
import os 
import pandas as pd
import pandas.plotting
from pandas.plotting import scatter_matrix
import numpy as np

Plotting and modelling, dataviz and 3D animation

In [5]:
import seaborn as sns
import matplotlib.pyplot as plt
import squarify
from matplotlib import animation
from matplotlib.animation import FuncAnimation,FFMpegFileWriter
from mpl_toolkits.mplot3d import Axes3D

Analysis models: PCA, regression and machine learning

In [6]:
import scipy.stats
from sklearn.preprocessing import scale
from sklearn.linear_model import LinearRegression

Map production

In [7]:
import geopandas as gpd
import mapclassify as mc
#import libpysal as lps
# import geoplot as gplt
#import pysal as ps
#from pysal.contrib.viz import mapping as maps

Interactive maps and animations

In [8]:
import folium
from folium.plugins import HeatMap
import PIL
import io
from ipywidgets import interact
import ipywidgets as widgets
from IPython.display import display
from IPython.display import Image
from IPython.display import HTML

Inferential statistics

In [9]:
import matplotlib.mlab as mlab
from scipy.stats import norm
from statsmodels.stats.proportion import proportion_confint
import scipy
import statsmodels
from scipy.stats import chi2_contingency
from scipy.stats import ks_2samp
import scipy.stats as stats
import researchpy as rp

=================================================================

Setting the working directory

=================================================================

In [10]:
base= r'D:\Navigation\Téléchargements\Cours Distance\Projet COVID\fichier'
base=base.replace('\\','/')
os.chdir(base)

Loading the local CSV files

In [11]:
RNB = pd.read_csv('RNB.csv',sep=',')
Continent =pd.read_csv('continent1.csv',sep = ",")
TabMond =pd.read_csv('tableau-donnes-monde.csv',sep = ";")

Loading the data from the web to get the latest update

In [12]:
Deces = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
Infections = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
Guerisions= pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
pandemie = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv')
EvolPand =pd.read_csv('https://www.data.gouv.fr/fr/datasets/r/15a5a5b8-8330-48a0-a385-e01b326d2213',sep = ";" ,skiprows= 3)
EvolMond =pd.read_csv('https://www.data.gouv.fr/fr/datasets/r/f4935ed4-7a88-44e4-8f8a-33910a151d42',sep = ";" ,skiprows= 3)

=================================================================

Data checks

=================================================================

Information about the data

In [13]:
print(Continent.info())
print(TabMond.info())
#print(RNB.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 207 entries, 0 to 206
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   dateRep                  207 non-null    object 
 1   day                      207 non-null    int64  
 2   month                    207 non-null    int64  
 3   year                     207 non-null    int64  
 4   cases                    207 non-null    int64  
 5   deaths                   207 non-null    int64  
 6   countriesAndTerritories  207 non-null    object 
 7   geoId                    206 non-null    object 
 8   countryterritoryCode     204 non-null    object 
 9   popData2018              204 non-null    float64
 10  continentExp             206 non-null    object 
dtypes: float64(1), int64(5), object(5)
memory usage: 17.9+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 212 entries, 0 to 211
Columns: 185 entries, AG.AGR.TRAC.NO to Unnamed: 184
dtypes: float64(183), int64(1), object(1)
memory usage: 306.5+ KB
None

Summary

In [14]:
EvolPand.describe()
Out[14]:
Infections Deces Guerisons TauxDeces TauxGuerison TauxInfection
count 1.100000e+02 110.000000 1.100000e+02 110.00000 110.000000 110.000000
mean 1.010260e+06 65613.327273 2.887447e+05 4.43000 25.982636 69.587455
std 1.281523e+06 90597.519623 3.956134e+05 1.77787 14.937614 15.535202
min 5.550000e+02 17.000000 2.800000e+01 2.04000 1.740000 41.270000
25% 7.526375e+04 2035.750000 1.479425e+04 2.92250 19.410000 60.310000
50% 1.894330e+05 7532.000000 7.946400e+04 3.97500 25.440000 68.380000
75% 1.887309e+06 118133.750000 4.419218e+05 6.25750 33.357500 77.642500
max 4.054145e+06 279916.000000 1.388961e+06 7.16000 55.270000 96.190000
In [15]:
Continent.describe() 
Out[15]:
day month year cases deaths popData2018
count 207.000000 207.000000 207.0 207.000000 207.000000 2.040000e+02
mean 5.024155 4.990338 2020.0 370.289855 19.222222 3.700000e+07
std 0.347524 0.139010 0.0 1839.451824 97.522608 1.407163e+08
min 5.000000 3.000000 2020.0 -9.000000 0.000000 1.000000e+03
25% 5.000000 5.000000 2020.0 0.000000 0.000000 1.267305e+06
50% 5.000000 5.000000 2020.0 6.000000 0.000000 7.042862e+06
75% 5.000000 5.000000 2020.0 122.000000 3.000000 2.547777e+07
max 10.000000 5.000000 2020.0 22593.000000 1252.000000 1.392730e+09
In [16]:
TabMond.describe()
Out[16]:
AG.LND.AGRI.K2 AG.LND.AGRI.ZS AG.LND.ARBL.HA.PC AG.LND.ARBL.ZS AG.LND.CREL.HA AG.LND.TRAC.ZS AG.PRD.CREL.MT AG.YLD.CREL.KG BM.GSR.MRCH.CD BX.GSR.MRCH.CD ... SP.POP.TOTL.MA.IN SP.POP.TOTL.MA.ZS SP.RUR.TOTL SP.RUR.TOTL.ZG SP.RUR.TOTL.ZS TM.VAL.AGRI.ZS.UN TM.VAL.SERV.CD.WT TX.VAL.AGRI.ZS.UN TX.VAL.SERV.CD.WT Unnamed: 184
count 1.870000e+02 2.080000e+02 207.000000 205.000000 205.000000 1.800000e+02 184.000000 1.810000e+02 180.000000 1.900000e+02 ... 191.000000 1.910000e+02 191.000000 2.110000e+02 205.000000 211.000000 196.000000 1.900000e+02 196.000000 1.900000e+02
mean 1.407091e+05 2.324006e+05 37.628094 0.191526 13.988169 4.059325e+06 406.552249 1.645110e+07 3550.066667 9.598750e+10 ... 49.875433 1.994623e+07 50.124567 1.602805e+07 -2.699669 39.195730 1.303510 2.765317e+10 4.187293 2.928991e+10
std 4.505034e+05 6.325318e+05 22.348497 0.245527 13.492865 1.217676e+07 785.009183 6.281379e+07 3138.557052 2.800742e+11 ... 3.358085 7.471032e+07 3.358085 7.434845e+07 24.188405 23.722483 0.958939 7.280446e+10 9.453633 8.128629e+10
min 1.000000e+00 3.000000e+00 0.557692 0.000100 0.002193 0.000000e+00 0.092963 0.000000e+00 167.600000 2.147624e+07 ... 24.495287 4.642100e+04 45.464657 0.000000e+00 -235.792446 0.000000 0.033998 1.712497e+07 0.000018 0.000000e+00
25% 3.850000e+02 1.950250e+03 19.570583 0.045936 2.889338 9.407975e+04 25.189658 1.789220e+05 1579.325000 2.461629e+09 ... 49.627770 1.083712e+06 49.035853 2.105085e+05 -0.782101 19.761500 0.618189 7.423167e+08 0.350208 4.748364e+08
50% 6.340000e+03 2.846950e+04 38.528185 0.118129 10.339925 6.938140e+05 115.568852 1.720429e+06 3026.000000 9.742460e+09 ... 50.268047 4.392429e+06 49.731953 2.011297e+06 0.188707 38.166000 1.126254 2.914497e+09 1.240366 2.814755e+09
75% 5.918100e+04 1.631425e+05 54.680000 0.238580 20.039062 2.459170e+06 391.750102 5.949600e+06 4493.050000 5.409540e+10 ... 50.964147 1.428257e+07 50.372230 9.794246e+06 1.229151 57.790000 1.721873 1.554191e+10 3.242023 1.556726e+10
max 4.389812e+06 5.277330e+06 82.559705 1.903525 59.646692 1.024931e+08 5895.227273 6.179303e+08 26110.200000 2.560000e+12 ... 54.535343 7.147578e+08 75.504713 8.923217e+08 3.982898 86.968000 7.066518 5.440000e+11 73.691101 8.060000e+11

8 rows × 184 columns

In [17]:
EvolMond.describe()
Out[17]:
Infections Deces Guerisons TauxDeces TauxGuerison TauxInfection
count 1.354400e+04 13544.000000 13544.000000 13544.000000 13544.000000 13544.000000
mean 8.204685e+03 532.877584 2344.822357 2.873835 22.669224 74.109951
std 5.180009e+04 3543.553818 12344.448487 5.240886 27.261315 28.136596
min 0.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000
25% 1.500000e+01 0.000000 1.000000 0.000000 0.250000 59.727500
50% 1.360000e+02 2.000000 15.000000 1.050000 11.760000 84.030000
75% 1.271500e+03 25.000000 208.000000 3.930000 34.780000 97.440000
max 1.309550e+06 78795.000000 212534.000000 100.000000 100.000000 100.000000
In [18]:
RNB.describe()
Out[18]:
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 ... 2011 2012 2013 2014 2015 2016 2017 2018 2019 Unnamed: 64
count 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 240.000000 239.000000 239.000000 239.000000 236.000000 236.000000 236.000000 228.000000 0.0 0.0
mean NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 16732.508022 17343.626500 17972.412379 18276.889459 18815.851131 19290.205592 20112.436079 20648.359525 NaN NaN
std NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 18164.638413 18513.127338 19057.745917 19122.558072 19175.040355 19512.834511 20259.523262 21070.013729 NaN NaN
min NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 650.000000 680.000000 700.000000 720.000000 750.000000 740.000000 740.000000 750.000000 NaN NaN
25% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 3950.000000 4040.000000 4445.000000 4615.000000 5155.000000 5286.383641 5500.288973 5082.500000 NaN NaN
50% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 10235.000000 11290.000000 11930.000000 12287.825095 12535.000000 13114.607057 13691.301953 13765.000000 NaN NaN
75% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 22060.000000 23690.000000 23845.000000 24585.000000 25407.500000 26434.448137 27862.500000 28677.500000 NaN NaN
max NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 114210.000000 114710.000000 117910.000000 118080.000000 121090.000000 122670.000000 124300.000000 124410.000000 NaN NaN

8 rows × 61 columns

In [19]:
Deces.describe()
Out[19]:
Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 ... 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20
count 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 ... 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000
mean 21.259359 22.432499 0.063910 0.067669 0.097744 0.157895 0.210526 0.308271 0.492481 0.500000 ... 877.293233 897.063910 916.590226 930.338346 945.627820 967.063910 991.936090 1013.409774 1033.451128 1050.041353
std 24.747943 70.478908 1.042337 1.043908 1.473615 2.453621 3.189730 4.660845 7.664297 7.664793 ... 5033.523652 5151.692272 5255.654602 5334.737320 5413.828455 5546.438359 5695.861834 5829.167335 5933.342912 6031.114726
min -51.796300 -135.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 6.907750 -18.093125 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
50% 23.488100 20.921188 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 8.000000 8.000000 8.000000 9.000000 9.000000 9.000000 9.000000 9.000000 10.000000 10.000000
75% 41.143200 77.191525 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 61.750000 67.500000 71.750000 75.250000 78.000000 79.750000 85.750000 88.250000 89.000000 90.750000
max 71.706900 178.065000 17.000000 17.000000 24.000000 40.000000 52.000000 76.000000 125.000000 125.000000 ... 62996.000000 64943.000000 66369.000000 67682.000000 68922.000000 71064.000000 73455.000000 75662.000000 77180.000000 78795.000000

8 rows × 111 columns

In [20]:
pandemie.iloc[:,4:].describe()  # open-ended slice: every column from position 4 onward
Out[20]:
Confirmed Deaths Recovered Active Incident_Rate People_Tested People_Hospitalized Mortality_Rate UID
count 1.870000e+02 187.000000 187.000000 1.870000e+02 185.000000 0.0 0.0 187.000000 187.000000
mean 2.166541e+04 1495.903743 7409.582888 1.284067e+04 95.010453 NaN NaN 4.079557 519.267380
std 1.029033e+05 7161.049382 25285.333028 7.839853e+04 218.933573 NaN NaN 4.475314 965.604637
min 6.000000e+00 0.000000 0.000000 0.000000e+00 0.089415 NaN NaN 0.000000 4.000000
25% 1.580000e+02 3.500000 45.000000 5.700000e+01 3.417143 NaN NaN 0.960663 206.000000
50% 9.390000e+02 21.000000 433.000000 4.700000e+02 16.358463 NaN NaN 3.012048 418.000000
75% 8.190500e+03 206.500000 2559.000000 3.338000e+03 94.907397 NaN NaN 5.556989 660.500000
max 1.309698e+06 78799.000000 212534.000000 1.033465e+06 1876.952089 NaN NaN 31.250000 9999.000000

Dimensions of each table (DataFrame)

In [21]:
print(Continent.shape);
print(EvolMond.shape);
print(TabMond.shape);
print(RNB.shape);
print(pandemie.shape);
print(EvolPand.shape);
(207, 11)
(13544, 8)
(212, 185)
(264, 65)
(187, 14)
(110, 7)

Displaying the first 5 rows of the data

In [22]:
TabMond.head(5)
Out[22]:
AG.AGR.TRAC.NO AG.LND.AGRI.K2 AG.LND.AGRI.ZS AG.LND.ARBL.HA.PC AG.LND.ARBL.ZS AG.LND.CREL.HA AG.LND.TRAC.ZS AG.PRD.CREL.MT AG.YLD.CREL.KG BM.GSR.MRCH.CD ... SP.POP.TOTL.MA.IN SP.POP.TOTL.MA.ZS SP.RUR.TOTL SP.RUR.TOTL.ZG SP.RUR.TOTL.ZS TM.VAL.AGRI.ZS.UN TM.VAL.SERV.CD.WT TX.VAL.AGRI.ZS.UN TX.VAL.SERV.CD.WT Unnamed: 184
0 ABW NaN 20.000000 11.111111 0.019071 11.111111 NaN NaN NaN NaN ... 52.531036 50244.0 47.468964 59897.0 0.245723 56.589 0.310605 8.227187e+08 0.075850 2.033752e+09
1 AFG 110.0 379100.000000 58.067580 0.218437 11.838679 2418725.0 0.143173 4897143.0 2024.7 ... 48.635847 19093281.0 51.364153 27695286.0 2.056463 74.505 1.895385 1.053675e+09 17.113304 2.528868e+08
2 AGO 8108.0 591900.000000 47.477340 0.169888 3.930376 3196553.0 27.958621 2891266.0 904.5 ... 50.530463 15241447.0 49.469537 10625055.0 1.337729 34.486 0.793311 9.771707e+09 0.028654 6.311245e+08
3 ALB 7438.0 11816.999510 43.127735 0.215674 22.638686 145799.0 121.934426 701734.0 4813.0 ... 49.063095 1460043.0 50.936905 1137407.0 -2.578121 39.681 0.660189 2.193217e+09 0.313849 3.566064e+09
4 AND 353.0 187.800007 39.957448 0.010091 1.659574 NaN 4584.415698 NaN NaN ... NaN NaN NaN 9193.0 0.742443 11.938 0.360355 NaN 1.551347 NaN

5 rows × 185 columns

In [23]:
RNB.head(2)
Out[23]:
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2011 2012 2013 2014 2015 2016 2017 2018 2019 Unnamed: 64
0 Aruba ABW RNB par habitant, ($ PPA internationaux courants) NY.GNP.PCAP.PP.CD NaN NaN NaN NaN NaN NaN ... 32060.0 33870.0 35030.0 36740.0 36480.0 36430.0 36960.0 NaN NaN NaN
1 Afghanistan AFG RNB par habitant, ($ PPA internationaux courants) NY.GNP.PCAP.PP.CD NaN NaN NaN NaN NaN NaN ... 1620.0 1810.0 1880.0 1900.0 1900.0 1910.0 1940.0 1970.0 NaN NaN

2 rows × 65 columns

In [24]:
Continent.head(5)
Out[24]:
dateRep day month year cases deaths countriesAndTerritories geoId countryterritoryCode popData2018 continentExp
0 05/05/2020 5 5 2020 190 5 Afghanistan AF AFG 37172386.0 Asia
1 05/05/2020 5 5 2020 8 0 Albania AL ALB 2866376.0 Europe
2 05/05/2020 5 5 2020 174 2 Algeria DZ DZA 42228429.0 Africa
3 05/05/2020 5 5 2020 2 0 Andorra AD AND 77006.0 Europe
4 05/05/2020 5 5 2020 0 0 Angola AO AGO 30809762.0 Africa
In [25]:
EvolMond.head(5)
Out[25]:
Date Pays Infections Deces Guerisons TauxDeces TauxGuerison TauxInfection
0 2020-05-10 Andorre 754 48 545 6.37 72.28 21.35
1 2020-05-10 Émirats Arabes Unis 18198 198 4804 1.09 26.40 72.51
2 2020-05-10 Afghanistan 4402 120 558 2.73 12.68 84.60
3 2020-05-10 Antigua-et-Barbuda 25 3 19 12.00 76.00 12.00
4 2020-05-10 Albanie 868 31 650 3.57 74.88 21.54
In [26]:
pandemie['Ratio']=pandemie['Recovered']/pandemie['Confirmed']
ratio=pandemie['Ratio']
pandemie.head(5)
Out[26]:
Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Incident_Rate People_Tested People_Hospitalized Mortality_Rate UID ISO3 Ratio
0 Australia 2020-05-10 13:32:32 -25.0000 133.0000 6941 97 6163 681 27.262694 NaN NaN 1.397493 36 AUS 0.887912
1 Austria 2020-05-10 13:32:32 47.5162 14.5501 15871 618 13991 1262 176.219133 NaN NaN 3.893895 40 AUT 0.881545
2 Canada 2020-05-10 13:32:32 60.0010 -95.0010 68924 4824 31262 32838 182.070326 NaN NaN 6.999013 124 CAN 0.453572
3 China 2020-05-10 13:32:32 30.5928 114.3055 83994 4637 79144 213 5.979598 NaN NaN 5.520632 156 CHN 0.942258
4 Denmark 2020-05-10 13:32:32 56.0000 10.0000 10627 529 8415 1683 183.470780 NaN NaN 4.977887 208 DNK 0.791851

=================================================================

Data processing

=================================================================

Converting the date columns

In [27]:
pandemie.Last_Update=pd.to_datetime(pandemie.Last_Update,format='%Y-%m-%d %H:%M:%S')
EvolPand.Date=pd.to_datetime(EvolPand.Date,format='%Y-%m-%d')
EvolMond.Date=pd.to_datetime(EvolMond.Date,format='%Y-%m-%d')
Continent.dateRep=pd.to_datetime(Continent.dateRep,format='%d/%m/%Y')

EvolPand['Date1']=EvolPand['Date'].apply(lambda x:x.strftime('%Y-%m'))
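
As an illustration of the conversion step, here is a minimal sketch on a hypothetical two-row table (the values are invented; only the column names `Date`, `dateRep` and `Date1` come from the notebook). It assumes the same two date formats as above and uses the vectorised `.dt.strftime` accessor, which should give the same result as the `apply` with a lambda.

```python
import pandas as pd

# Hypothetical sample mimicking the 'Date' and 'dateRep' columns
df = pd.DataFrame({"Date": ["2020-05-09", "2020-05-10"],
                   "dateRep": ["09/05/2020", "10/05/2020"]})

# An explicit format avoids any day/month ambiguity
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")
df["dateRep"] = pd.to_datetime(df["dateRep"], format="%d/%m/%Y")

# Vectorised equivalent of .apply(lambda x: x.strftime('%Y-%m'))
df["Date1"] = df["Date"].dt.strftime("%Y-%m")
print(df.dtypes)
```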

Reordering the columns

In [28]:
cols = EvolPand.columns.tolist()
cols = cols[-1:] + cols[:-1]
cols
EvolPand = EvolPand[cols]

EvolPand= EvolPand.sort_values('Infections',ascending=False)

Extracting and merging the data

In [29]:
pandemie.rename(columns={'Country_Region':'Pays','Long_':'Long','Confirmed':'Infections','Deaths':'Decedes','Recovered':'Guerisions','ISO3':'Code'},inplace=True)
pandemie.columns
Out[29]:
Index(['Pays', 'Last_Update', 'Lat', 'Long', 'Infections', 'Decedes',
       'Guerisions', 'Active', 'Incident_Rate', 'People_Tested',
       'People_Hospitalized', 'Mortality_Rate', 'UID', 'Code', 'Ratio'],
      dtype='object')
In [30]:
RNB2014=RNB[['Country Code','2014']].rename(columns={'Country Code':'Code','2014':'RNB'})  # rename returns a new DataFrame, which avoids the SettingWithCopyWarning
RNB2014.columns
Out[30]:
Index(['Code', 'RNB'], dtype='object')
In [31]:
donnee=pd.merge(pandemie[['Pays','Lat','Long','Infections','Decedes','Guerisions','Code']],RNB2014, how='left', on='Code')
donnee.columns
Out[31]:
Index(['Pays', 'Lat', 'Long', 'Infections', 'Decedes', 'Guerisions', 'Code',
       'RNB'],
      dtype='object')
In [32]:
Continent.rename(columns={'countryterritoryCode':'Code','popData2018':'Pop2018','continentExp':'Continent'},inplace=True)
Continent.columns
Out[32]:
Index(['dateRep', 'day', 'month', 'year', 'cases', 'deaths',
       'countriesAndTerritories', 'geoId', 'Code', 'Pop2018', 'Continent'],
      dtype='object')
In [33]:
donnee=pd.merge(donnee,Continent[['Code','Pop2018', 'Continent']], how='left', on='Code')
donnee.columns
Out[33]:
Index(['Pays', 'Lat', 'Long', 'Infections', 'Decedes', 'Guerisions', 'Code',
       'RNB', 'Pop2018', 'Continent'],
      dtype='object')
In [34]:
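# Caution: judging from TabMond.head() above, the CSV header appears to be shifted by one column
# (the column labelled 'AG.AGR.TRAC.NO' holds the ISO country codes), so the two indicator columns
# renamed here may actually hold the indicators listed one position earlier in the header.
TabMond.rename(columns={'AG.AGR.TRAC.NO':'Code','SP.RUR.TOTL.ZS':'TauxPopRural','AG.LND.ARBL.ZS':'TauxSurfRural'},inplace=True)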
TabMond.columns
Out[34]:
Index(['Code', 'AG.LND.AGRI.K2', 'AG.LND.AGRI.ZS', 'AG.LND.ARBL.HA.PC',
       'TauxSurfRural', 'AG.LND.CREL.HA', 'AG.LND.TRAC.ZS', 'AG.PRD.CREL.MT',
       'AG.YLD.CREL.KG', 'BM.GSR.MRCH.CD',
       ...
       'SP.POP.TOTL.MA.IN', 'SP.POP.TOTL.MA.ZS', 'SP.RUR.TOTL',
       'SP.RUR.TOTL.ZG', 'TauxPopRural', 'TM.VAL.AGRI.ZS.UN',
       'TM.VAL.SERV.CD.WT', 'TX.VAL.AGRI.ZS.UN', 'TX.VAL.SERV.CD.WT',
       'Unnamed: 184'],
      dtype='object', length=185)
In [35]:
donnee=pd.merge(donnee,TabMond[['Code','TauxPopRural', 'TauxSurfRural']], how='left', on='Code')
donnee.columns
Out[35]:
Index(['Pays', 'Lat', 'Long', 'Infections', 'Decedes', 'Guerisions', 'Code',
       'RNB', 'Pop2018', 'Continent', 'TauxPopRural', 'TauxSurfRural'],
      dtype='object')
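
The summary further down reports 191 rows for `donnee`, although `pandemie` has 187. This suggests that one of the left merges matched some keys more than once. One possible cause (an assumption, not verified on the real files) is missing country codes, since pandas matches missing keys with each other. The sketch below reproduces that effect on hypothetical miniature tables and shows one way to guard against it with `validate` and `indicator`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature tables; np.nan stands for a missing country code
left = pd.DataFrame({"Code": ["FRA", np.nan], "Infections": [10, 5]})
right = pd.DataFrame({"Code": ["FRA", np.nan, np.nan],
                      "Pop2018": [67e6, 1e3, 2e3]})

# pandas matches missing keys with each other, so the NaN row is duplicated
naive = pd.merge(left, right, how="left", on="Code")

# Dropping right-hand rows without a key keeps one output row per left row;
# validate raises if the right keys are not unique, indicator flags unmatched rows
clean = pd.merge(left, right.dropna(subset=["Code"]), how="left", on="Code",
                 validate="many_to_one", indicator=True)
print(len(left), len(naive), len(clean))
```

Comparing `len()` before and after each merge is a cheap check that a left join has not silently added rows.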

Reordering the columns

In [36]:
donnee = donnee[['Pays', 'Code','Continent','Lat', 'Long', 'Infections','Decedes','Guerisions','RNB','Pop2018','TauxPopRural','TauxSurfRural']]
In [37]:
list(donnee.columns.values)
Out[37]:
['Pays',
 'Code',
 'Continent',
 'Lat',
 'Long',
 'Infections',
 'Decedes',
 'Guerisions',
 'RNB',
 'Pop2018',
 'TauxPopRural',
 'TauxSurfRural']

Computing rates for the variables

In [38]:
# The denominator is the total 2018 population of all listed countries, so each
# "Taux" column is the raw variable rescaled by one constant (per 100,000 of that total).
pop_totale=donnee['Pop2018'].sum()
donnee['TauxInfections']=donnee['Infections']/pop_totale*100000
donnee['TauxDecedes']=donnee['Decedes']/pop_totale*100000
donnee['TauxGuerisions']=donnee['Guerisions']/pop_totale*100000
donnee['TauxRNB']=donnee['RNB']/pop_totale*100000
donnee['TauxPop2018']=donnee['Pop2018']/pop_totale*100000

donnee.columns
Out[38]:
Index(['Pays', 'Code', 'Continent', 'Lat', 'Long', 'Infections', 'Decedes',
       'Guerisions', 'RNB', 'Pop2018', 'TauxPopRural', 'TauxSurfRural',
       'TauxInfections', 'TauxDecedes', 'TauxGuerisions', 'TauxRNB',
       'TauxPop2018'],
      dtype='object')
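
For comparison, a per-country incidence rate divides each country's count by its own population rather than by the world total. The sketch below contrasts the two definitions on invented figures; the column names `TauxMonde` and `TauxPays` are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Hypothetical figures, not taken from the data set
df = pd.DataFrame({"Pays": ["A", "B"],
                   "Infections": [500, 500],
                   "Pop2018": [1_000_000, 10_000_000]})

# Share of the overall total: the same denominator for every row
df["TauxMonde"] = df["Infections"] / df["Pop2018"].sum() * 100_000

# Incidence per 100,000 inhabitants: each country divided by its own population
df["TauxPays"] = df["Infections"] / df["Pop2018"] * 100_000
print(df)
```

With equal counts, the first definition gives both countries the same rate, while the second shows that country A is hit ten times harder relative to its size.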

Summary

In [39]:
donnee.iloc[:,5:].describe()  # open-ended slice: every column from position 5 onward
Out[39]:
Infections Decedes Guerisions RNB Pop2018 TauxPopRural TauxSurfRural TauxInfections TauxDecedes TauxGuerisions TauxRNB TauxPop2018
count 1.910000e+02 191.000000 191.000000 172.000000 1.840000e+02 178.000000 178.000000 191.000000 191.000000 191.000000 172.000000 184.000000
mean 2.121923e+04 1464.732984 7261.162304 18639.941860 4.098466e+07 -1.067001 0.211357 0.281378 0.019423 0.096287 0.247176 543.478261
std 1.018603e+05 7088.490590 25038.481527 19414.113498 1.476558e+08 17.754981 0.255741 1.350721 0.093997 0.332024 0.257441 1957.993664
min 6.000000e+00 0.000000 0.000000 720.000000 1.000000e+03 -235.792446 0.000100 0.000080 0.000000 0.000000 0.009548 0.013261
25% 1.520000e+02 3.000000 37.000000 4617.500000 2.406217e+06 -0.687439 0.070381 0.002016 0.000040 0.000491 0.061230 31.907709
50% 8.920000e+02 20.000000 433.000000 11780.000000 9.609240e+06 0.280972 0.140026 0.011828 0.000265 0.005742 0.156209 127.423619
75% 8.097000e+03 193.000000 2547.500000 25227.500000 2.956375e+07 1.370222 0.252941 0.107371 0.002559 0.033781 0.334530 392.030966
max 1.309698e+06 78799.000000 212534.000000 117750.000000 1.392730e+09 3.982898 1.903525 17.367289 1.044916 2.818313 1.561427 18468.337587

Completing the Continent table for the maps

In [40]:
Continent=pd.merge(Continent,donnee[['Code','TauxPopRural', 'TauxSurfRural','TauxInfections', 'TauxDecedes', 'TauxGuerisions', 'TauxRNB','TauxPop2018']], how='left', on='Code')
# Note: 'cases' and 'deaths' are listed twice below, so the result has duplicate column names
Continent=Continent[['dateRep', 'day', 'month', 'year', 'cases', 'deaths','countriesAndTerritories', 'geoId', 'Code','Continent','cases', 'deaths','TauxPopRural', 'TauxSurfRural', 'TauxInfections', 'TauxDecedes','TauxGuerisions', 'TauxRNB', 'TauxPop2018']]
Continent.columns
Out[40]:
Index(['dateRep', 'day', 'month', 'year', 'cases', 'deaths',
       'countriesAndTerritories', 'geoId', 'Code', 'Continent', 'cases',
       'deaths', 'TauxPopRural', 'TauxSurfRural', 'TauxInfections',
       'TauxDecedes', 'TauxGuerisions', 'TauxRNB', 'TauxPop2018'],
      dtype='object')

=============================================================================

Univariate analysis

=============================================================================

In [41]:
EvolPand.iloc[:,[2,3,4]].describe()
print("-"*20)
print("moyenne:\n",EvolPand['Infections'].mean())
print("mediane:\n",EvolPand['Infections'].median())
print("variance:\n",EvolPand['Infections'].var(ddof=0))
print("std:\n",EvolPand['Infections'].std(ddof=0))
print("skewness:\n",EvolPand['Infections'].skew())
print("kurtosis:\n",EvolPand['Infections'].kurtosis())

--------------------
moyenne:
 1010259.7727272727
mediane:
 189433.0
variance:
 1627371227736.03
std:
 1275684.6113895196
skewness:
 1.0692209329223974
kurtosis:
 -0.31651243511033034
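
The standard deviation printed here (about 1,275,685) is slightly smaller than the one reported by `EvolPand.describe()` earlier (1.281523e+06). The difference comes from `ddof`: the cell above uses `ddof=0` (division by n), whereas `describe()` and the default `std()` use `ddof=1` (division by n - 1). A small sketch on invented values illustrates the relation between the two.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical values
n = len(s)

pop_std = s.std(ddof=0)   # population formula, divides by n
sample_std = s.std()      # pandas default ddof=1, divides by n - 1

# The two estimates differ by the factor sqrt(n / (n - 1))
print(pop_std, sample_std, pop_std * np.sqrt(n / (n - 1)))
```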

Evolution of the pandemic worldwide (with the pandas module)

In [42]:
sns.set()
EvolPand.plot.scatter(x='Date',y='Infections',label='Evolution des infections dans le monde',c='black')
plt.savefig('EvolutionInfecNu.png')
EvolPand.plot.scatter(x='Date',y='Guerisons',label='Evolution des guérisions dans le monde',c='green')
plt.savefig('EvolutGuerisNU.png')
EvolPand.plot.scatter(x='Date',y='Deces',label='Evolution des décès dans le monde',c='red')
plt.grid(True)
plt.savefig('EvolutDeceNu.png')

Histogram of the coronavirus evolution worldwide (updated from the data source site)

In [43]:
plt.style.use('seaborn-talk')
EvolPand['Infections'].plot.hist()
EvolPand['Guerisons'].plot.hist()
EvolPand['Deces'].plot.hist()
plt.title('Évolution de CORONAVIRUS dans le monde')
plt.legend()
plt.grid(True)
plt.savefig('EvolutionsCorna.png')

Trend curves of the coronavirus worldwide

In [44]:
EvolPand=EvolPand.sort_values('Date',ascending=True)
sns.set()
plt.style.use('seaborn-talk')
plt.plot(EvolPand['Date'],EvolPand['Infections'],label='Cas confirmés',color='black')
plt.plot(EvolPand['Date'],EvolPand['Deces'],label='Cas décédés',color='red')
plt.plot(EvolPand['Date'],EvolPand['Guerisons'],label='Cas guerris',color='green')
plt.xlabel('Date')
plt.ylabel('nombre de cas dans le monde')
plt.title('Évolution de Coronavirus dans le monde en date de'+' '+str(EvolPand ['Date'].max()),fontsize=20 )
plt.legend()
plt.savefig('evolution.png')  # save before show(): once the figure has been shown, savefig writes an empty canvas
plt.show()

Scatter-matrix visualisation

In [45]:
EvolPand.info()
pandas.plotting.scatter_matrix(EvolPand.select_dtypes(exclude=['object','datetime64[ns]','float64']))
plt.savefig('MatrixPlotEvoCorno.png')
<class 'pandas.core.frame.DataFrame'>
Int64Index: 110 entries, 109 to 0
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Date1          110 non-null    object        
 1   Date           110 non-null    datetime64[ns]
 2   Infections     110 non-null    int64         
 3   Deces          110 non-null    int64         
 4   Guerisons      110 non-null    int64         
 5   TauxDeces      110 non-null    float64       
 6   TauxGuerison   110 non-null    float64       
 7   TauxInfection  110 non-null    float64       
dtypes: datetime64[ns](1), float64(3), int64(3), object(1)
memory usage: 7.7+ KB

Normality of the data

The fit of the infection counts to a normal distribution can be tested with D'Agostino's normality test:

In [46]:
stats.normaltest(donnee['Infections'])
Out[46]:
NormaltestResult(statistic=376.47315932737536, pvalue=1.777837897764084e-82)

The normality hypothesis can therefore be rejected at the 5% level.

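The fit of the infection counts to a normal distribution can also be tested with a Kolmogorov-Smirnov test, used here in its two-sample form against a simulated normal sample with the same mean and standard deviation: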

In [47]:
ks_2samp(donnee['Infections'],list(np.random.normal(np.mean(donnee['Infections']), np.std(donnee['Infections']), 1000)))
Out[47]:
Ks_2sampResult(statistic=0.388, pvalue=1.7763568394002505e-15)

The normality hypothesis can therefore be rejected at the 5% level.

In [48]:
ks_2samp(donnee['Decedes'],list(np.random.normal(np.mean(donnee['Decedes']), np.std(donnee['Decedes']), 1000)))
Out[48]:
Ks_2sampResult(statistic=0.4153455497382199, pvalue=1.7763568394002505e-15)
In [49]:
ks_2samp(donnee['Guerisions'],list(np.random.normal(np.mean(donnee['Guerisions']), np.std(donnee['Guerisions']), 1000)))
Out[49]:
Ks_2sampResult(statistic=0.41110994764397907, pvalue=1.7763568394002505e-15)
In [50]:
ks_2samp(donnee['RNB'],list(np.random.normal(np.mean(donnee['RNB']), np.std(donnee['RNB']), 1000)))
Out[50]:
Ks_2sampResult(statistic=0.184, pvalue=3.154394356330581e-05)
In [51]:
ks_2samp(donnee['Pop2018'],list(np.random.normal(np.mean(donnee['Pop2018']), np.std(donnee['Pop2018']), 1000)))
Out[51]:
Ks_2sampResult(statistic=0.396, pvalue=1.7763568394002505e-15)
In [52]:
ks_2samp(donnee['TauxPopRural'],list(np.random.normal(np.mean(donnee['TauxPopRural']), np.std(donnee['TauxPopRural']), 1000)))
Out[52]:
Ks_2sampResult(statistic=0.41905759162303663, pvalue=1.7763568394002505e-15)
In [53]:
ks_2samp(donnee['TauxSurfRural'],list(np.random.normal(np.mean(donnee['TauxSurfRural']), np.std(donnee['TauxSurfRural']), 1000)))
Out[53]:
Ks_2sampResult(statistic=0.218, pvalue=3.593231369114491e-07)

Overall, the normality hypothesis can be rejected at the 5% level.
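
The tests above compare the data with a freshly simulated normal sample, so their p-values change slightly from run to run. A deterministic alternative is the one-sample Kolmogorov-Smirnov test against the normal distribution itself. The sketch below runs it on a hypothetical right-skewed sample (not the notebook data), next to D'Agostino's test. One caveat: the mean and standard deviation are estimated from the same sample, so the KS p-value is only approximate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical right-skewed sample standing in for a count variable
x = rng.lognormal(mean=5.0, sigma=2.0, size=200)

# One-sample KS test against N(mean, std), parameters estimated from x
ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

# D'Agostino-Pearson omnibus test based on skewness and kurtosis
k2 = stats.normaltest(x)
print(ks.pvalue, k2.pvalue)
```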

=================================================================

Bivariate analysis

=================================================================

Boxplots of coronavirus infections by continent

In [54]:
# plt.figure(figsize=(8,6), dpi=80)
sns.set()
donnee.dropna().boxplot(column='Infections',by='Continent')
#plt.legend()
plt.savefig('InfectioContinet.png')

# plt.figure(figsize=(8,6), dpi=80)
sns.set()
donnee.dropna().boxplot(column='Decedes',by='Continent')
#plt.legend()
plt.savefig('DecedesContinet.png')

# plt.figure(figsize=(8,6), dpi=80)
sns.set()
donnee.dropna().boxplot(column='Guerisions',by='Continent')
#plt.legend()
plt.savefig('GuerisContinet.png')

The equality of variances can be tested with Levene's test:

H0: equal variances, against H1: unequal variances

In [55]:
stats.levene(donnee['Infections'],donnee['Decedes'])
Out[55]:
LeveneResult(statistic=6.966810368969871, pvalue=0.008645396604345278)

The p-value is below 5%, so the test is significant: H0 is rejected and the variances are not equal at the 5% level.
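
As a minimal sketch of how Levene's test behaves, the example below applies it to two simulated samples with the same mean and clearly different spreads (hypothetical data, not the notebook variables). By default `scipy.stats.levene` centres each group on its median, which makes the test fairly robust to non-normal data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=1.0, size=300)  # hypothetical sample, small spread
b = rng.normal(loc=0.0, scale=5.0, size=300)  # hypothetical sample, large spread

res = stats.levene(a, b)  # default center='median'
print(res.statistic, res.pvalue)
```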

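Whether infections and deaths share the same central tendency can be tested with the Wilcoxon signed-rank test, a non-parametric alternative to the paired t-test that suits these non-normal data:

H0: the paired differences are centred on zero, against H1: they are not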

In [56]:
stats.wilcoxon(donnee['Infections'],donnee['Decedes'])
Out[56]:
WilcoxonResult(statistic=0.0, pvalue=4.2909383161843184e-33)

The p-value is below 5%, so the test is significant: H0 is rejected, and infection and death counts do not share the same central value at the 5% level.
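
The statistic of 0.0 reported above is what the signed-rank test returns when every paired difference has the same sign, which is expected here because a country cannot report more deaths than infections. The sketch below reproduces this on simulated pairs (hypothetical data) and computes the paired t-test next to it for comparison.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical paired counts: infections always exceed deaths
deaths = rng.integers(1, 1000, size=50).astype(float)
infections = deaths * rng.uniform(5, 50, size=50)

w = stats.wilcoxon(infections, deaths)   # non-parametric, based on signed ranks
t = stats.ttest_rel(infections, deaths)  # parametric paired t-test
print(w.statistic, w.pvalue, t.pvalue)
```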

World-level statistics

In [57]:
donnee= donnee.sort_values('Infections',ascending=False)

colM1=['Infections','Decedes','Guerisions','RNB','Pop2018']
colM2=['TauxInfections','TauxDecedes','TauxGuerisions','TauxRNB','TauxPop2018','TauxPopRural','TauxSurfRural']
In [58]:
donnee[colM1].sum()
Out[58]:
Infections    4.052873e+06
Decedes       2.797640e+05
Guerisions    1.386882e+06
RNB           3.206070e+06
Pop2018       7.541177e+09
dtype: float64

Correlation matrix and significance between the variables

In [59]:
donnee[colM1].corr()
Out[59]:
Infections Decedes Guerisions RNB Pop2018
Infections 1.000000 0.933925 0.808598 0.240101 0.232437
Decedes 0.933925 1.000000 0.788086 0.248506 0.186759
Guerisions 0.808598 0.788086 1.000000 0.258184 0.323859
RNB 0.240101 0.248506 0.258184 1.000000 -0.052541
Pop2018 0.232437 0.186759 0.323859 -0.052541 1.000000

Another method, which returns 3 tables

In [60]:
corr_type, corr_matrix, corr_ps = rp.corr_case(donnee[colM1].dropna())
corr_ps
Out[60]:
Infections Decedes Guerisions RNB Pop2018
Infections 0.0000 0.0000 0.0000 0.0015 0.0024
Decedes 0.0000 0.0000 0.0000 0.0010 0.0157
Guerisions 0.0000 0.0000 0.0000 0.0006 0.0000
RNB 0.0015 0.0010 0.0006 0.0000 0.4936
Pop2018 0.0024 0.0157 0.0000 0.4936 0.0000

The hypothesis of no correlation can therefore be rejected at the 5% level for every pair except RNB and Pop2018, whose correlation is not significant (p = 0.4936).
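A table of pairwise p-values like the one above can also be built without researchpy. The following is a minimal sketch using `scipy.stats.pearsonr`. The helper `pearson_pvalues` and the synthetic DataFrame `df` are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from scipy import stats

def pearson_pvalues(df):
    """Return a DataFrame of two-sided Pearson p-values for every column pair."""
    clean = df.dropna()  # list-wise deletion: keep only complete rows
    cols = clean.columns
    # The diagonal stays at 0, matching the table above
    pvals = pd.DataFrame(np.zeros((len(cols), len(cols))), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            _, p = stats.pearsonr(clean[a], clean[b])
            pvals.loc[a, b] = pvals.loc[b, a] = p
    return pvals

# Synthetic data (assumption): 'linked' depends on 'x', 'noise' does not
rng = np.random.default_rng(2)
x = rng.normal(size=150)
df = pd.DataFrame({
    "x": x,
    "linked": 2 * x + rng.normal(scale=0.5, size=150),
    "noise": rng.normal(size=150),
})
pvals = pearson_pvalues(df)
print(pvals.round(4))
```

Calling `pearson_pvalues(donnee[colM1])` would be the analogous use on the notebook's data, assuming every column is numeric.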

In [61]:
rp.corr_pair(donnee[colM1].dropna())
Out[61]:
r value p-value N
Infections & Decedes 0.9336 0.0000 172
Infections & Guerisions 0.8076 0.0000 172
Infections & RNB 0.2401 0.0015 172
Infections & Pop2018 0.2298 0.0024 172
Decedes & Guerisions 0.7870 0.0000 172
Decedes & RNB 0.2485 0.0010 172
Decedes & Pop2018 0.1839 0.0157 172
Guerisions & RNB 0.2582 0.0006 172
Guerisions & Pop2018 0.3207 0.0000 172
RNB & Pop2018 -0.0525 0.4936 172

Correlation and significance at the 5% level for all variables

In [62]:
donnee[colM2].corr()
Out[62]:
TauxInfections TauxDecedes TauxGuerisions TauxRNB TauxPop2018 TauxPopRural TauxSurfRural
TauxInfections 1.000000 0.933925 0.808598 0.240101 0.232437 -0.000211 0.103007
TauxDecedes 0.933925 1.000000 0.788086 0.248506 0.186759 0.002871 0.062544
TauxGuerisions 0.808598 0.788086 1.000000 0.258184 0.323859 0.001317 0.082987
TauxRNB 0.240101 0.248506 0.258184 1.000000 -0.052541 -0.299532 0.007525
TauxPop2018 0.232437 0.186759 0.323859 -0.052541 1.000000 0.011713 -0.022112
TauxPopRural -0.000211 0.002871 0.001317 -0.299532 0.011713 1.000000 0.061880
TauxSurfRural 0.103007 0.062544 0.082987 0.007525 -0.022112 0.061880 1.000000
In [63]:
corr_type, corr_matrix, corr_ps = rp.corr_case(donnee[colM2].dropna())
corr_type
Out[63]:
Pearson correlation test using list-wise deletion
0 Total observations used = 169
In [64]:
corr_matrix
Out[64]:
TauxInfections TauxDecedes TauxGuerisions TauxRNB TauxPop2018 TauxPopRural TauxSurfRural
TauxInfections 1 0.9337 0.8076 0.2474 0.2296 0.0003 0.1003
TauxDecedes 0.9337 1 0.7868 0.2606 0.1835 0.0034 0.0584
TauxGuerisions 0.8076 0.7868 1 0.2699 0.3203 0.0021 0.0777
TauxRNB 0.2474 0.2606 0.2699 1 -0.0507 -0.2993 0.0249
TauxPop2018 0.2296 0.1835 0.3203 -0.0507 1 0.012 -0.0285
TauxPopRural 0.0003 0.0034 0.0021 -0.2993 0.012 1 0.0633
TauxSurfRural 0.1003 0.0584 0.0777 0.0249 -0.0285 0.0633 1
In [65]:
corr_ps
Out[65]:
TauxInfections TauxDecedes TauxGuerisions TauxRNB TauxPop2018 TauxPopRural TauxSurfRural
TauxInfections 0.0000 0.0000 0.0000 0.0012 0.0027 0.9965 0.1944
TauxDecedes 0.0000 0.0000 0.0000 0.0006 0.0170 0.9646 0.4511
TauxGuerisions 0.0000 0.0000 0.0000 0.0004 0.0000 0.9786 0.3154
TauxRNB 0.0012 0.0006 0.0004 0.0000 0.5127 0.0001 0.7482
TauxPop2018 0.0027 0.0170 0.0000 0.5127 0.0000 0.8769 0.7131
TauxPopRural 0.9965 0.9646 0.9786 0.0001 0.8769 0.0000 0.4138
TauxSurfRural 0.1944 0.4511 0.3154 0.7482 0.7131 0.4138 0.0000
In [66]:
rp.corr_pair(donnee[colM2].dropna())
Out[66]:
r value p-value N
TauxInfections & TauxDecedes 0.9337 0.0000 169
TauxInfections & TauxGuerisions 0.8076 0.0000 169
TauxInfections & TauxRNB 0.2474 0.0012 169
TauxInfections & TauxPop2018 0.2296 0.0027 169
TauxInfections & TauxPopRural 0.0003 0.9965 169
TauxInfections & TauxSurfRural 0.1003 0.1944 169
TauxDecedes & TauxGuerisions 0.7868 0.0000 169
TauxDecedes & TauxRNB 0.2606 0.0006 169
TauxDecedes & TauxPop2018 0.1835 0.0170 169
TauxDecedes & TauxPopRural 0.0034 0.9646 169
TauxDecedes & TauxSurfRural 0.0584 0.4511 169
TauxGuerisions & TauxRNB 0.2699 0.0004 169
TauxGuerisions & TauxPop2018 0.3203 0.0000 169
TauxGuerisions & TauxPopRural 0.0021 0.9786 169
TauxGuerisions & TauxSurfRural 0.0777 0.3154 169
TauxRNB & TauxPop2018 -0.0507 0.5127 169
TauxRNB & TauxPopRural -0.2993 0.0001 169
TauxRNB & TauxSurfRural 0.0249 0.7482 169
TauxPop2018 & TauxPopRural 0.0120 0.8769 169
TauxPop2018 & TauxSurfRural -0.0285 0.7131 169
TauxPopRural & TauxSurfRural 0.0633 0.4138 169

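At the 5% level, the hypothesis of no correlation is rejected among TauxInfections, TauxDecedes and TauxGuerisions, between each of these and TauxRNB or TauxPop2018, and between TauxRNB and TauxPopRural (r = -0.2993, p = 0.0001). The remaining pairs are not significantly correlated: TauxRNB with TauxPop2018, and every other pair involving TauxPopRural or TauxSurfRural.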
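To read the pairwise table systematically, the rows can be split at the 5% threshold. The sketch below copies the r and p-values from Out[66] into a small DataFrame; only the variable names `rows`, `pairs`, `significant` and `not_significant` are new.

```python
import pandas as pd

# (pair, r, p-value) rows copied from the Out[66] table above
rows = [
    ("TauxInfections & TauxDecedes", 0.9337, 0.0000),
    ("TauxInfections & TauxGuerisions", 0.8076, 0.0000),
    ("TauxInfections & TauxRNB", 0.2474, 0.0012),
    ("TauxInfections & TauxPop2018", 0.2296, 0.0027),
    ("TauxInfections & TauxPopRural", 0.0003, 0.9965),
    ("TauxInfections & TauxSurfRural", 0.1003, 0.1944),
    ("TauxDecedes & TauxGuerisions", 0.7868, 0.0000),
    ("TauxDecedes & TauxRNB", 0.2606, 0.0006),
    ("TauxDecedes & TauxPop2018", 0.1835, 0.0170),
    ("TauxDecedes & TauxPopRural", 0.0034, 0.9646),
    ("TauxDecedes & TauxSurfRural", 0.0584, 0.4511),
    ("TauxGuerisions & TauxRNB", 0.2699, 0.0004),
    ("TauxGuerisions & TauxPop2018", 0.3203, 0.0000),
    ("TauxGuerisions & TauxPopRural", 0.0021, 0.9786),
    ("TauxGuerisions & TauxSurfRural", 0.0777, 0.3154),
    ("TauxRNB & TauxPop2018", -0.0507, 0.5127),
    ("TauxRNB & TauxPopRural", -0.2993, 0.0001),
    ("TauxRNB & TauxSurfRural", 0.0249, 0.7482),
    ("TauxPop2018 & TauxPopRural", 0.0120, 0.8769),
    ("TauxPop2018 & TauxSurfRural", -0.0285, 0.7131),
    ("TauxPopRural & TauxSurfRural", 0.0633, 0.4138),
]
pairs = pd.DataFrame(rows, columns=["pair", "r", "p"])

significant = pairs[pairs["p"] < 0.05]
not_significant = pairs[pairs["p"] >= 0.05]
print(significant.to_string(index=False))
```

Ten of the 21 pairs pass the threshold. The only significant pair involving a rurality variable is TauxRNB with TauxPopRural.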

Visualising the relationships between variables

In [67]:
sns.pairplot(donnee[['Continent' ,'Infections','Decedes','Guerisions','RNB','Pop2018']].dropna(), hue="Continent", markers=["o", "s", "D","d","x"])
Out[67]:
<seaborn.axisgrid.PairGrid at 0x1db2cc48488>
In [68]:
sns.pairplot(donnee[['Continent' ,'TauxInfections','TauxDecedes','TauxGuerisions','TauxRNB','TauxPop2018','TauxPopRural','TauxSurfRural']].dropna(), hue="Continent", markers=["o", "s", "D","d","x"])
Out[68]:
<seaborn.axisgrid.PairGrid at 0x1db2d45d108>

Dataviz

In [69]:
sns.pairplot(donnee[['Continent' ,'TauxInfections','TauxDecedes','TauxGuerisions','TauxRNB','TauxPop2018','TauxPopRural','TauxSurfRural']].dropna(), hue = 'Continent', diag_kind = 'kde',plot_kws = {'alpha': 0.6, 's': 80, 'edgecolor': 'k'},height = 2)
plt.suptitle('Dispersion de CORONAVIRUS entre Continent')
Out[69]:
Text(0.5, 0.98, 'Dispersion de CORONAVIRUS entre Continent')

Spread of the pandemic across Europe

In [70]:
grid = sns.PairGrid(data= donnee[donnee['Continent']=='Europe'].dropna(), vars = ['TauxInfections', 'TauxDecedes', 'TauxGuerisions','TauxRNB','TauxPop2018','TauxPopRural','TauxSurfRural'])
grid = grid.map_upper(plt.scatter, color = 'darkred')
grid = grid.map_diag(plt.hist, bins = 10, color = 'darkred',edgecolor = 'k')
grid = grid.map_lower(sns.kdeplot, cmap = 'Reds')
plt.suptitle('Dispersion de CORONAVIRUS en EUROPE', size = 15)
Out[70]:
Text(0.5, 0.98, 'Dispersion de CORONAVIRUS en EUROPE')

Spread of the pandemic across Europe (with correlation)

In [71]:
def corr(x, y, **kwargs):
    
    coef = np.corrcoef(x, y)[0][1]
    label = r'$\rho$ = ' + str(round(coef, 2))
    ax = plt.gca()
    ax.annotate(label, xy = (0.2, 0.95), size = 20, xycoords = ax.transAxes)
    
grid = sns.PairGrid(data= donnee[donnee['Continent']=='Europe'].dropna(), vars = ['TauxInfections', 'TauxDecedes', 'TauxGuerisions','TauxRNB','TauxPop2018','TauxPopRural','TauxSurfRural'])
grid = grid.map_upper(plt.scatter, color = 'darkred')
grid = grid.map_upper(corr)
grid = grid.map_lower(sns.kdeplot, cmap = 'Reds')
grid = grid.map_diag(plt.hist, bins = 10, edgecolor =  'k', color = 'darkred');
plt.suptitle('Dispersion de CORONAVIRUS en EUROPE', size = 15)
Out[71]:
Text(0.5, 0.98, 'Dispersion de CORONAVIRUS en EUROPE')

=================================================================

Data visualisation (learning)

=================================================================

Visualisation and correlation

In [72]:
corr=donnee[colM1].corr()
corr1=donnee[colM2].corr()
In [73]:
ax =sns.heatmap(corr,xticklabels=corr.columns,yticklabels=corr.columns)
plt.title('Matrice de Corrélation', fontsize = 20)
#plt.xlabel('Vari', fontsize = 15)
#plt.ylabel('Vari', fontsize = 15)
plt.show()
In [74]:
ax =sns.heatmap(corr1,xticklabels=corr1.columns,yticklabels=corr1.columns)
plt.title('Matrice de Corrélation', fontsize = 20)
#plt.xlabel('Vari', fontsize = 15)
#plt.ylabel('Vari', fontsize = 15)
plt.show()

3D visualisation

In [78]:
donnee['Continent']=pd.Categorical(donnee['Continent'])
my_color=donnee['Continent'].cat.codes
sns.set_style("white")
fig = plt.figure()
fig.subplots_adjust(left=0, bottom=0, right=1, top=1)
ax = fig.add_subplot(111, projection='3d')
#ax.set_facecolor((0.5, 0.5, 0.5))
ax.scatter(donnee['Infections'],donnee['Decedes'],donnee['Guerisions'],c=my_color,alpha=0.8 ,cmap="Set2_r", s=60)
xAxisLine = ((min(donnee['Infections']), max(donnee['Infections'])), (0, 0), (0,0))
ax.plot(xAxisLine[0], xAxisLine[1], xAxisLine[2], 'r')
yAxisLine = ((0, 0), (min(donnee['Decedes']), max(donnee['Decedes'])), (0,0))
ax.plot(yAxisLine[0], yAxisLine[1], yAxisLine[2], 'r')
zAxisLine = ((0, 0), (0,0), (min(donnee['Guerisions']), max(donnee['Guerisions'])))
ax.plot(zAxisLine[0], zAxisLine[1], zAxisLine[2], 'r')
ax.set_xlabel("Infections")
ax.set_ylabel("Decedes")
ax.set_zlabel("Guerisions")
ax.set_title("Evolution de CORONAVIRUS dans le monde")
#plt.axis('off')
# plot.show()
plt.close()
# 3D animation: rotate the camera around the vertical axis
from IPython.display import HTML

def update(i, fig, ax):
    ax.view_init(elev=20., azim=i)
    return fig, ax
 
anim = FuncAnimation(fig, update, frames=np.arange(0, 360, 2), repeat=True, fargs=(fig, ax))
# 'imagemagick' was unavailable in the original run and matplotlib fell back to Pillow, so request it directly
anim.save('evolution3D.gif', dpi=80, writer='pillow', fps=24)
HTML(anim.to_html5_video())
Out[78]:

Treemap: countries above the mean for each variable

In [76]:
color_list = ['#0f7216', '#b2790c', '#ffe9a3','#f9d4d4', '#d35158', '#ea3033']

plt.rc('font', size=10)
top_infections = donnee[donnee['Infections']>donnee['Infections'].mean()]  # sizes and labels from the same rows
squarify.plot(sizes=top_infections['Infections'], label=top_infections['Code'], alpha=.8,color=color_list)
plt.axis('off')
plt.title("Infections Superieures à la Moyenne par Pays",fontsize=12,fontweight="bold")
plt.show()

plt.rc('font', size=10)
top_decedes = donnee[donnee['Decedes']>donnee['Decedes'].mean()]  # sizes and labels from the same rows
squarify.plot(sizes=top_decedes['Decedes'], label=top_decedes['Code'], alpha=.8,color=color_list )
plt.axis('off')
plt.title("Décès Superieurs à la Moyenne par Pays",fontsize=12,fontweight="bold")
plt.show()

plt.rc('font', size=10)
top_guerisions = donnee[donnee['Guerisions']>donnee['Guerisions'].mean()]  # sizes and labels from the same rows
squarify.plot(sizes=top_guerisions['Guerisions'], label=top_guerisions['Code'], alpha=.8,color=color_list )
plt.axis('off')
plt.title("Guérisions Superieures à la Moyenne par Pays",fontsize=12,fontweight="bold")
plt.show()
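In a treemap, labels are matched to tiles by position, so the labels need the same row filter as the sizes. The sketch below shows the idea on a hypothetical four-row miniature of `donnee`. The frame `demo` and the helper `above_mean` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical miniature of the `donnee` table, sorted by Infections
demo = pd.DataFrame({
    "Code": ["AAA", "BBB", "CCC", "DDD"],
    "Infections": [900, 500, 40, 10],
    "Decedes": [20, 80, 1, 60],
})

def above_mean(df, column):
    """Rows whose `column` exceeds its mean, so sizes and labels stay aligned."""
    return df[df[column] > df[column].mean()]

top_deaths = above_mean(demo, "Decedes")
sizes = top_deaths["Decedes"].tolist()
labels = top_deaths["Code"].tolist()
print(list(zip(labels, sizes)))
```

Taking the labels from the unfiltered `demo["Code"]` would instead pair the first two codes, AAA and BBB, with these two tiles. The filtered lists can be passed as `squarify.plot(sizes=sizes, label=labels)`.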

Custom heatmap

In [79]:
def heatmap1(x, y, size):
    fig, ax = plt.subplots()
    
    # Mapping from column names to integer coordinates
    x_labels = [v for v in sorted(x.unique())]
    y_labels = [v for v in sorted(y.unique())]
    x_to_num = {p[1]:p[0] for p in enumerate(x_labels)} 
    y_to_num = {p[1]:p[0] for p in enumerate(y_labels)} 
    
    size_scale = 500
    ax.scatter(
        x=x.map(x_to_num), 
        y=y.map(y_to_num), 
        s=size * size_scale,
        marker='s' 
    )
    
    
    ax.set_xticks([x_to_num[v] for v in x_labels])
    ax.set_xticklabels(x_labels, rotation=45, horizontalalignment='right')
    ax.set_yticks([y_to_num[v] for v in y_labels])
    ax.set_yticklabels(y_labels)
    return ax  # return the axes so the caller can style the grid
    
corr2 = pd.melt(corr1.reset_index(), id_vars='index')
corr2.columns = ['x', 'y', 'value']
sns.set()
# Capture the returned axes: without this, `ax` still pointed at the 3D axes
# from the earlier cell, whose grid() raised a TypeError
ax = heatmap1(x=corr2['x'],y=corr2['y'],size=corr2['value'].abs())
ax.grid(False, which='major')
ax.grid(True, which='minor')
ax.set_xticks([t + 0.5 for t in ax.get_xticks()], minor=True)
ax.set_yticks([t + 0.5 for t in ax.get_yticks()], minor=True)

=================================================================

Interactive world map

=================================================================

Countries affected by the coronavirus

In [80]:
mpCororna = folium.Map(location=[10,10],zoom_start=1.5,tiles='Stamen Toner')
HeatMap(donnee[['Lat','Long']].dropna(), radius=16).add_to(mpCororna)
mpCororna.save('corona_mapa.html')
# open the saved HTML file from the workspace to view the map
display(mpCororna)

Ranking of the most-infected countries, with daily death counts

In [81]:
mpInfections=folium.Map(location=[42,5],zoom_start=6, max_zoom=5,min_zoom=2)
for i in range(0,len(Infections)):
  folium.Circle(
      location=[Infections.iloc[i]['Lat'],Infections.iloc[i]['Long']],
      fill=True,
      radius=(int((np.log(Infections.iloc[i,-1]+1.00001)))+0.2)*25000, # log scale keeps the circles small
      color='red',
      fill_color='indigo',
      tooltip = "<div style='margin: 0; background-color: black; color: white;'>"+
                    "<h4 style='text-align:center;font-weight: bold'>"+Infections.iloc[i]['Country/Region'] + "</h4>"
                    "<hr style='margin:10px;color: white;'>"+
                    "<ul style='color: white;;list-style-type:circle;align-item:left;padding-left:20px;padding-right:20px'>"+
                        "<li>Infections: "+str(Infections.iloc[i,-1])+"</li>"+
                        "<li>Deces d'aujourd'hui:   "+str(Deces.iloc[i,-1])+"</li>"+
                        "<li>Taux de mortalite d'aujourd'hui: "+ str(np.round(Deces.iloc[i,-1]/(Infections.iloc[i,-1]+1.00001)*100,2))+ "</li>"+
                    "</ul></div>",
        ).add_to(mpInfections)
mpInfections.save("infections.html")

# open the saved HTML file from the workspace to view the map

display(mpInfections)
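The circle radius and the mortality figure in the tooltip both come from short formulas embedded in the loop above. As a sketch, they are pulled out here into two helper functions so the arithmetic can be checked on its own. The names `circle_radius` and `fatality_pct` are hypothetical; the formulas are copied from the cell.

```python
import numpy as np

def circle_radius(count):
    """Radius in metres: integer part of log(count + 1.00001), plus 0.2, times 25000."""
    return (int(np.log(count + 1.00001)) + 0.2) * 25000

def fatality_pct(deaths, infections):
    """Deaths per 100 infections; the 1.00001 offset avoids a division by zero."""
    return np.round(deaths / (infections + 1.00001) * 100, 2)

for n in (0, 1000, 1_000_000):
    print(n, circle_radius(n))
print(fatality_pct(50, 999))
```

Because the logarithm is truncated with `int`, the radius grows in steps of 25 km: a country with no cases gets a 5 km circle, one with 1,000 cases 155 km, and one with a million cases 330 km.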

Ranking of the countries with the most recoveries, with daily death counts

In [82]:
mpGuerision=folium.Map(location=[46,2],zoom_start=6, max_zoom=5,min_zoom=2,tiles='Stamen Toner')
for i in range(0,len(Guerisions)):
  folium.Circle(
      location=[Guerisions.iloc[i]['Lat'],Guerisions.iloc[i]['Long']],
      fill=True,
      radius=(int((np.log(Guerisions.iloc[i,-1]+1.00001)))+0.2)*25000,# log scale keeps the circles small
      color='green',
      fill_color='green',
      legend_name='Guerisons',
      tooltip = "<div style='margin: 0; background-color: black; color: white;'>"+
                    "<h4 style='text-align:center;font-weight: bold'>"+Guerisions.iloc[i]['Country/Region'] + "</h4>"
                    "<hr style='margin:10px;color: white;'>"+
                    "<ul style='color: white;;list-style-type:circle;align-item:left;padding-left:20px;padding-right:20px'>"+
                        "<li>Guerisons: "+str(Guerisions.iloc[i,-1])+"</li>"+
                        "<li>Deces d'aujourd'hui:   "+str(Deces.iloc[i,-1])+"</li>"+
                        "<li>Taux de mortalite d'aujourd'hui: "+ str(np.round(Deces.iloc[i,-1]/(Infections.iloc[i,-1]+1.00001)*100,2))+ "</li>"+
                    "</ul></div>",
        ).add_to(mpGuerision)
mpGuerision.save("guerisions.html")

# open the saved HTML file from the workspace to view the map
display(mpGuerision)

Map with a ratio slider

=============================================================================

World map

=============================================================================

Loading the map

In [84]:
Monde= gpd.read_file("pays-monde.shp")
type(Monde)
Monde.head()
Out[84]:
FIPS ISO2 ISO3 UN NAME AREA POP2005 REGION SUBREGION LON LAT geometry
0 AC AG ATG 28 Antigua and Barbuda 44 83039 19 29 -61.783 17.078 MULTIPOLYGON (((-61.68667 17.02444, -61.88722 ...
1 AG DZ DZA 12 Algeria 238174 32854159 2 15 2.632 28.163 POLYGON ((2.96361 36.80222, 4.78583 36.89472, ...
2 AJ AZ AZE 31 Azerbaijan 8260 8352021 142 145 47.395 40.430 MULTIPOLYGON (((45.08332 39.76805, 45.81999 39...
3 AL AL ALB 8 Albania 2740 3153731 150 39 20.068 41.143 POLYGON ((19.43621 41.02107, 19.60056 41.79666...
4 AM AM ARM 51 Armenia 2820 3017661 142 145 44.563 40.534 POLYGON ((45.15387 41.19860, 46.00194 40.22555...

Merging the data

In [85]:
donnee['ISO3']=donnee['Code']
donnee.columns
Monde=pd.merge(Monde,donnee[['Continent','Infections','Decedes','Guerisions','RNB','Pop2018','TauxPopRural', 'TauxSurfRural','TauxInfections', 'TauxDecedes', 'TauxGuerisions', 'TauxRNB','TauxPop2018','ISO3']], how='left', on='ISO3')
Monde1=Monde.dropna()
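A left merge keeps every map row, and countries without statistics end up with missing values that `dropna()` then removes. The sketch below shows one way to list those unmatched codes with the `indicator` option of `pd.merge`. The two small frames and their values are invented for illustration.

```python
import pandas as pd

# Hypothetical miniatures of the map table and the statistics table
shapes = pd.DataFrame({
    "ISO3": ["FRA", "DZA", "ATG"],
    "NAME": ["France", "Algeria", "Antigua and Barbuda"],
})
stats_table = pd.DataFrame({"ISO3": ["FRA", "DZA"], "Infections": [100, 20]})

# indicator=True adds a '_merge' column telling where each row came from
merged = pd.merge(shapes, stats_table, how="left", on="ISO3", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "ISO3"].tolist()
print(unmatched)
```

Checking `unmatched` before calling `dropna()` shows which countries will disappear from the choropleth maps.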
In [86]:
sns.set_style("white")
ax = Monde1.plot(column='Infections',cmap='Blues',figsize=(15, 15),legend=True, alpha=0.5, edgecolor='k',linewidth=0.4,scheme='Quantiles', k=5)
ax.set_title("Nombres d'infections par le CORONAVIRUS par pays", fontdict = {'fontsize':20}, pad = 12.5) 
#ax.set_axis_off()   # uncomment to hide the axes
ax.get_legend().set_bbox_to_anchor((0.2, 0.6))
ax.get_legend().set_title('Legende')
In [87]:
sns.set_style("white")
ax = Monde1.plot(column='Guerisions',cmap='Greens',figsize=(15, 15),legend=True, alpha=0.5, edgecolor='k',linewidth=0.4,scheme='Quantiles', k=5)
ax.set_title('Nombre de guéris du CORONAVIRUS par pays', fontdict = {'fontsize':20}, pad = 12.5) 
#ax.set_axis_off()
ax.get_legend().set_bbox_to_anchor((0.2, 0.6))
ax.get_legend().set_title('Legende')
In [88]:
sns.set_style("white")
ax = Monde1.plot(column='Decedes',cmap='Reds',figsize=(15, 15),legend=True, alpha=0.5, edgecolor='k',linewidth=0.4,scheme='Quantiles', k=5)
ax.set_title('Nombre de décès du CORONAVIRUS par pays', fontdict = {'fontsize':20}, pad = 12.5) 
#ax.set_axis_off()
ax.get_legend().set_bbox_to_anchor((0.2, 0.6))
ax.get_legend().set_title('Legende')

Customising the maps with scheme = 'user_defined' and classification_kwds = {'bins':[10, 20, 50, 100, 500, 1000, 5000, 10000, 500000]}

In [89]:
sns.set_style("white")
ax = Monde1.plot(column='Decedes',cmap='GnBu',figsize=(15, 15),legend=True, alpha=0.5, edgecolor='k',linewidth=0.4,scheme = 'user_defined', classification_kwds = {'bins':[10, 20, 50, 100, 500, 1000, 5000, 10000, 500000]})
ax.set_title('Nombre de décès du CORONAVIRUS par pays', fontdict = {'fontsize':20}, pad = 12.5) 
ax.set_axis_off()
ax.get_legend().set_bbox_to_anchor((0.2, 0.6))
ax.get_legend().set_title('Legende')
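To see how user-defined bins partition the death counts, the sketch below classifies a few sample values with `numpy.digitize`. It assumes, as in mapclassify's user-defined scheme, that each bin value is the inclusive upper bound of its class. It illustrates the binning rule and is not the library's own code; the sample `values` are made up.

```python
import numpy as np

# Bin edges copied from the cell above
bins = [10, 20, 50, 100, 500, 1000, 5000, 10000, 500000]
values = np.array([0, 10, 11, 750, 500000])  # made-up sample counts

# right=True makes each bin's upper bound inclusive: bins[i-1] < x <= bins[i]
classes = np.digitize(values, bins, right=True)
print(classes.tolist())
```

Counts of 0 and 10 fall in the first class, 11 in the second, 750 in the class capped at 1000, and 500000 in the last one.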

Animated GIF map

In [ ]:
# aggregate by country: one row per country, one column per date
InfectionsData = Infections.groupby('Country/Region').sum()
DecesData = Deces.groupby('Country/Region').sum()
GuerisionsData = Guerisions.groupby('Country/Region').sum()

# drop the coordinate columns
InfectionsData =InfectionsData.drop(columns = ['Lat', 'Long'])
DecesData =DecesData.drop(columns = ['Lat', 'Long'])
GuerisionsData =GuerisionsData.drop(columns = ['Lat', 'Long'])

# reload the map
Monde= gpd.read_file("pays-monde.shp")

# rename countries so that the names match for the join
Monde.replace('Viet Nam', 'Vietnam', inplace = True)
Monde.replace('Brunei Darussalam', 'Brunei', inplace = True)
Monde.replace('Cape Verde', 'Cabo Verde', inplace = True)
Monde.replace('Democratic Republic of the Congo', 'Congo (Kinshasa)', inplace = True)
Monde.replace('Congo', 'Congo (Brazzaville)', inplace = True)
Monde.replace('Czech Republic', 'Czechia', inplace = True)
Monde.replace('Swaziland', 'Eswatini', inplace = True)
Monde.replace('Iran (Islamic Republic of)', 'Iran', inplace = True)
Monde.replace('Korea, Republic of', 'Korea, South', inplace = True)
Monde.replace("Lao People's Democratic Republic", 'Laos', inplace = True)
Monde.replace('Libyan Arab Jamahiriya', 'Libya', inplace = True)
Monde.replace('Republic of Moldova', 'Moldova', inplace = True)
Monde.replace('The former Yugoslav Republic of Macedonia', 'North Macedonia', inplace = True)
Monde.replace('Syrian Arab Republic', 'Syria', inplace = True)
Monde.replace('Taiwan', 'Taiwan*', inplace = True)
Monde.replace('United Republic of Tanzania', 'Tanzania', inplace = True)
Monde.replace('United States', 'US', inplace = True)
Monde.replace('Palestine', 'West Bank and Gaza', inplace = True)

# join the tables on the country name
mergeInfections = Monde.join(InfectionsData, on = 'NAME', how = 'right')
mergeDeces = Monde.join(DecesData, on = 'NAME', how = 'right')
mergeGuerisions = Monde.join(GuerisionsData, on = 'NAME', how = 'right')
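Joining on names only works for names that match exactly, so it helps to list what is still unmatched after the renaming. The sketch below is a minimal check on invented data: three of the renames come from the cell above, while the remaining names and the variables `map_names` and `case_names` are illustrative assumptions.

```python
import pandas as pd

# Three of the renames used above, gathered in one dictionary
renames = {"Viet Nam": "Vietnam", "Czech Republic": "Czechia", "United States": "US"}

# Hypothetical name lists standing in for Monde['NAME'] and the case-table index
map_names = pd.Series(["Viet Nam", "Czech Republic", "United States", "France", "Greenland"], name="NAME")
case_names = {"Vietnam", "Czechia", "US", "France", "Kosovo"}

harmonised = map_names.replace(renames)  # exact-value replacement
missing_from_cases = sorted(set(harmonised) - case_names)
missing_from_map = sorted(case_names - set(harmonised))
print(missing_from_cases, missing_from_map)
```

Any name left in either list would produce a row with missing geometry or missing counts after the join.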

infections.gif

In [ ]:
image_frames = []

# One frame per date column (the first 12 columns come from the shapefile)
for dates in mergeInfections.columns.to_list()[12:len(mergeInfections.columns)]:
  
    ax = mergeInfections.plot(column = dates, 
                    cmap = 'Blues', 
                    figsize = (15,15), 
                    legend = True,
                    alpha=0.5,
                    scheme = 'user_defined', 
                    classification_kwds = {'bins':[10, 20, 50, 100, 500, 1000, 5000, 10000, 500000]}, 
                    edgecolor = 'black',
                    linewidth = 0.4)
    
    ax.set_title("Nombres d'infections par le CORONAVIRUS par pays: "+ dates, fontdict = 
                 {'fontsize':20}, pad = 12.5)
    
    ax.set_axis_off()
     
    ax.get_legend().set_bbox_to_anchor((0.18, 0.6))
    ax.get_legend().set_title('Legende')
     
    img = ax.get_figure()
    
    # Render the figure into an in-memory PNG and keep it as a PIL image
    f = io.BytesIO()
    img.savefig(f, format = 'png', bbox_inches = 'tight')
    f.seek(0)
    image_frames.append(PIL.Image.open(f))
    plt.close(img)  # free the figure once the frame is stored
In [ ]:
image_frames[0].save('InfectionsPays.gif', format = 'GIF',
            append_images = image_frames[1:], 
            save_all = True, duration = 300, 
            loop = 3)

f.close()
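The frame-to-GIF step can be tried in isolation, without the maps. The sketch below is a minimal, self-contained version that uses three plain coloured images in place of the rendered figures; it assumes Pillow is installed. The image size and colours are arbitrary choices.

```python
import io
from PIL import Image

frames = []
for shade in (40, 120, 200):
    # Stand-in for a rendered map: a plain image written to an in-memory PNG
    buffer = io.BytesIO()
    Image.new("RGB", (64, 64), (shade, shade, 255)).save(buffer, format="PNG")
    buffer.seek(0)  # rewind so PIL reads the PNG from the start
    frames.append(Image.open(buffer))

# Same save call as above, written to memory instead of a file
gif_bytes = io.BytesIO()
frames[0].save(gif_bytes, format="GIF", append_images=frames[1:],
               save_all=True, duration=300, loop=3)

gif_bytes.seek(0)
n_frames = Image.open(gif_bytes).n_frames
print(n_frames)
```

Here `duration` is the display time of each frame in milliseconds and `loop` is the number of repetitions, as in the cell above.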
In [90]:
Image('InfectionsPays.gif')
Out[90]:
<IPython.core.display.Image object>
In [91]:
Image('Deces.gif')
Out[91]:
<IPython.core.display.Image object>
In [95]:
Image('Guerision.gif')
Out[95]:
<IPython.core.display.Image object>

Tools used

Talend, to clean and resample some of the data

Spyder (IPython), to process the data

Jupyter Notebook, for writing and layout

The GIF maps are available in the workspace

CONCLUSION

In summary, the COVID-19 project shows that this coronavirus pandemic depends on the resilience capacities of the countries studied, and that its spread varies markedly from one country to another.